(WIP) Cleaned Flux2 Klein Implementation, with benchmarking done on v6 TPU by amepas · Pull Request #434 · AI-Hypercomputer/maxdiffusion

amepas · 2026-06-29T20:49:38Z

(WIP) This PR will be updated with multi-chip latency numbers and support for Flux2 Klein 9B.

Draft PR for the Flux2 Klein model. It includes a custom implementation of the Qwen3-4B model for computing text embeddings. The VAE decoder, RoPE positional embedder, and flow-matching step schedule are all reused. The transformer and attention blocks receive light modifications.

Latency for a batch of 4 images at 1024x1024 (bfloat16):

Prompt Encoding (Qwen3): 57.67 ms (1.58% of total)
Denoising Loop (Flux 4 steps): 3,181.20 ms (87.09% of total)
- Per-Step Transformer Time: 795.30 ms
VAE Decoding (VAE): 413.77 ms (11.33% of total)
Total: 3.65 seconds
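
As a quick sanity check on the breakdown above, the percentages and the per-step time can be recomputed from the three stage latencies. This is a minimal sketch; the variable names are illustrative and are not taken from the PR's benchmarking code.

```python
# Stage latencies in milliseconds, copied from the breakdown above.
stage_ms = {
    "Prompt Encoding (Qwen3)": 57.67,
    "Denoising Loop (Flux 4 steps)": 3181.20,
    "VAE Decoding (VAE)": 413.77,
}
num_steps = 4  # denoising steps reported in the breakdown

# Total latency and each stage's share of it.
total_ms = sum(stage_ms.values())
shares = {name: 100.0 * ms / total_ms for name, ms in stage_ms.items()}

# Average transformer time per denoising step.
per_step_ms = stage_ms["Denoising Loop (Flux 4 steps)"] / num_steps

for name, ms in stage_ms.items():
    print(f"{name}: {ms:,.2f} ms ({shares[name]:.2f}% of total)")
print(f"Per-step transformer time: {per_step_ms:.2f} ms")
print(f"Total: {total_ms / 1000.0:.2f} seconds")
```

The recomputed values agree with the reported figures, so the three stages account for the whole 3.65 second total.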

The PR includes code for verifying the accuracy of the implementation. Model sharding is implemented but not yet tested.

Only image generation is supported so far.

github-actions · 2026-06-29T20:49:45Z

e2e testgrid: https://8bcf50593faf4ea38060e236169827e5-dot-us-central1.composer.googleusercontent.com/dags/maxdiffusion_tpu_e2e/grid

entrpn · 2026-06-30T00:14:32Z

+
+class GenerateFlux2KleinE2ETest(unittest.TestCase):
+
+    def test_end_to_end_parity_and_offloading(self):


This test is very likely to fail in the GitHub runner with the hardcoded values. We usually don't run e2e tests on the GitHub runner, so you can mark it so that it is skipped there.
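
One way to mark the test, sketched under the assumption that the runner is GitHub Actions (which sets the `GITHUB_ACTIONS` environment variable to `true`). The repository may already have its own convention for this, such as a pytest marker, so treat the decorator and the skip message below as illustrative only.

```python
import os
import unittest

# GitHub Actions sets GITHUB_ACTIONS=true for every workflow run.
RUNNING_IN_GITHUB_ACTIONS = os.environ.get("GITHUB_ACTIONS") == "true"


class GenerateFlux2KleinE2ETest(unittest.TestCase):

    @unittest.skipIf(
        RUNNING_IN_GITHUB_ACTIONS,
        "e2e test is not meant for the GitHub runner",
    )
    def test_end_to_end_parity_and_offloading(self):
        # Placeholder body for this sketch; the real test lives in the PR.
        self.assertTrue(True)
```

With this in place the test still runs locally and on TPU hosts, and is reported as skipped rather than failed in the runner.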

entrpn · 2026-06-30T00:14:41Z

+        every single stage against the golden PyTorch reference.
+        """
+        # Set highest precision for strict mathematical parity checks
+        jax.config.update("jax_default_matmul_precision", "highest")


what is the reason for using highest here?

Cleaned Flux2 Klein Implementation, with benchmarking done on v6 TPU

379c1a0

amepas requested review from chandrasekhard2 and eltsai June 29, 2026 20:49

amepas requested a review from entrpn as a code owner June 29, 2026 20:49

amepas marked this pull request as draft June 29, 2026 20:53

chandrasekhard2 requested review from Perseus14 and mbohlool June 29, 2026 23:16

entrpn reviewed Jun 30, 2026

entrpn requested changes Jun 30, 2026

amepas changed the title ~~Cleaned Flux2 Klein Implementation, with benchmarking done on v6 TPU~~ (WIP) Cleaned Flux2 Klein Implementation, with benchmarking done on v6 TPU Jun 30, 2026

amepas added 3 commits June 30, 2026 18:00

Fixes to support FSDP

a86a6fe

minor tweaks for hf_cache location

59ee14a

more tiny fixes

20427f2

amepas wants to merge 4 commits into main from flux2klein-onboarding

2 participants

